First we create the following two files for each project:

$ git shortlog -ns > sympy-all.txt
$ git shortlog -ns --since="1 year ago" > sympy-year.txt
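
Running the two commands by hand in every repository gets tedious, so the step can be scripted. The following is a minimal sketch in standalone Python 3, not part of the original notebook; the helper names `shortlog_command` and `write_shortlogs` and the `repos/` directory layout are assumptions made for illustration. The revision `HEAD` is passed explicitly because `git shortlog` reads the log from standard input when no revision is given and stdin is not a terminal.

```python
import subprocess


def shortlog_command(since=None):
    """Build the `git shortlog -ns` command, optionally limited by --since."""
    cmd = ["git", "shortlog", "-ns"]
    if since is not None:
        cmd.append("--since=%s" % since)
    # Name the revision explicitly: without a terminal on stdin,
    # git shortlog would otherwise read the log from standard input.
    cmd.append("HEAD")
    return cmd


def write_shortlogs(project, repo_dir):
    """Write <project>-all.txt and <project>-year.txt into the current directory."""
    for suffix, since in [("all", None), ("year", "1 year ago")]:
        with open("%s-%s.txt" % (project, suffix), "w") as out:
            subprocess.run(shortlog_command(since), cwd=repo_dir,
                           stdout=out, check=True)


# Hypothetical usage, assuming a clone of each project lives under ./repos/:
# for project in ["sympy", "ipython", "numpy"]:
#     write_shortlogs(project, "repos/" + project)
```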

Then we load them and create various plots. First we analyze the last year only.


In [6]:
%pylab inline
def get_data(filename):
    # Each line of `git shortlog -ns` output is "<count> <author>"; keep the counts.
    with open(filename) as f:
        data = array([int(line.split()[0]) for line in f])
    return data


Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.

The linear tail on the log-linear graph shows that each project's patch counts fall off exponentially with contributor rank:


In [7]:
for project in ["sympy", "ipython", "numpy", "matplotlib", "django"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("total number of patches")
xlim([0, 130]);
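
A straight line on a semilog plot means that log(count) falls linearly with contributor rank, that is, count ≈ A·exp(−k·rank). One way to check this numerically is to fit a line to the logarithm of the tail. The sketch below is standalone and not part of the original notebook: the function name `fit_exponential_tail` and the cutoff `start` are illustrative choices, and the data are synthetic, not taken from any project.

```python
import numpy as np


def fit_exponential_tail(data, start=0):
    """Fit count ~ A * exp(-k * rank) to data[start:] and return (k, A).

    Assumes all counts are positive, which holds for shortlog output.
    """
    data = np.asarray(data, dtype=float)
    ranks = np.arange(len(data))[start:]
    # A linear least-squares fit in log space: log(count) = log(A) - k * rank
    slope, intercept = np.polyfit(ranks, np.log(data[start:]), 1)
    return -slope, np.exp(intercept)


# Synthetic, exactly exponential data (made up for illustration).
synthetic = 200.0 * np.exp(-0.1 * np.arange(50))
k, A = fit_exponential_tail(synthetic, start=10)
print(k, A)
```

On real data the head of the curve (the few most active contributors) usually does not follow the same line, which is why the sketch takes a `start` cutoff.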


Including the linux kernel:


In [8]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("total number of patches")
#xlim([0, 130]);


The same graph restricted to 130 people max:


In [9]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("total number of patches")
xlim([0, 130]);


And 20 people:


In [10]:
for project in ["sympy", "ipython", "numpy", "matplotlib", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("total number of patches")
xlim([0, 20]);


We can normalize the curves by the total number of patches:


In [11]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("relative number of patches")
xlim([0, 130]);
ylim([1e-4, 1]);


Or by the most active contributor:


In [12]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("number of patches relative to \nthe most active contributor")
xlim([0, 130]);
ylim([1e-4, 1]);
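
The two normalizations above differ only in the divisor. A small standalone sketch with made-up counts (not real project data) shows the effect of each:

```python
import numpy as np

# Made-up patch counts, sorted in descending order like the shortlog files.
counts = np.array([50, 30, 20])

# Share of all patches contributed by each person; the values sum to 1.
by_total = counts / float(counts.sum())

# Patches relative to the most active contributor; the first value is 1.
by_top = counts / float(counts[0])

print(by_total)
print(by_top)
```

Dividing by the total makes projects of different size comparable by share of work, while dividing by the first entry pins every curve to 1 at the most active contributor.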



In [13]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    data = data / float(data[0])
    semilogy(data, lw=2, label="%s last year" % project)
legend()
grid()
xlabel("individual people")
ylabel("number of patches relative to \nthe most active contributor")
xlim([0, 20]);
ylim([1e-2, 1]);


Now we make the same graphs for all patches (not just the last year):


In [76]:
for project in ["sympy", "ipython", "numpy", "matplotlib", 'sklearn', 'pandas', 'scipy']:
    data = get_data("%s-all.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    # Pad shorter series to 300 entries with 0.55, just below the ylim floor of 0.6
    data = np.append(data, [0.55]*(300 - len(data)))
    semilogy(data, lw=2, label="%s all" % project)

legend()
grid()
xlabel("individual people")
ylabel("total number of patches")
xlim([0, 300]);
ylim([0.6, 1e4]);
savefig("commits-all.pdf")



In [58]:
for project in ["sympy", "ipython", "numpy", "matplotlib", "django", "linux"]:
    data = get_data("%s-all.txt" % project)
    data = data / float(sum(data))
    #data = data / float(data[0])
    semilogy(data, lw=2, label="%s all" % project)
legend()
grid()
xlabel("individual people")
ylabel("relative number of patches")
xlim([0, 130]);
ylim([1e-4, 1]);



In [11]:
for project in ["sympy", "ipython", "numpy", "mpl", "django", "linux"]:
    data = get_data("%s-all.txt" % project)
    #data = data / float(sum(data))
    data = data / float(data[0])
    semilogy(data, lw=2, label="%s all" % project)
legend()
grid()
xlabel("individual people")
ylabel("number of patches relative to \nthe most active contributor")
xlim([0, 130]);
ylim([1.5e-4, 1]);



In [75]:
for project in ["sympy", "ipython", "numpy", "matplotlib", "sklearn", "pandas", "scipy"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    #data = data / float(data[0])
    plot(data, lw=2, label="%s last year" % project)
   
axhline(50, lw=1, color='k', linestyle='--')
legend()
grid()
xlabel("Individual committer")
ylabel("# of commits")
xlim([0, 25]);
#ylim([0, 1]);
savefig("commits1.pdf")



In [74]:
for project in ["sympy", "ipython", "numpy", "matplotlib", "sklearn", "pandas", "scipy"]:
    data = get_data("%s-year.txt" % project)
    #data = data / float(sum(data))
    data = data / float(data[0])
    plot(data, lw=2, label="%s last year" % project)
axhline(0.1, lw=1, color='k', linestyle='--')
legend()
grid()
xlabel("Individual committer")
ylabel("Commit rate")
xlim([0, 25]);
#ylim([0, 1]);
savefig("commits2.pdf")



In [ ]: